Efficiently Bounding Optimal Solutions after Small Data Modification in Large-Scale Empirical Risk Minimization

نویسندگان

  • Hiroyuki Hanada
  • Atsushi Shibagaki
  • Jun Sakuma
  • Ichiro Takeuchi
چکیده

We study large-scale classification problems in changing environments where a small part of the dataset is modified, and the effect of the data modification must be quickly incorporated into the classifier. When the entire dataset is large, even if the amount of the data modification is fairly small, the computational cost of re-training the classifier would be prohibitively large. In this paper, we propose a novel method for efficiently incorporating such a data modification effect into the classifier without actually re-training it. The proposed method provides bounds on the unknown optimal classifier with the cost only proportional to the size of the data modification. We demonstrate through numerical experiments that the proposed method provides sufficiently tight bounds with negligible computational costs, especially when a small part of the dataset is modified in a large-scale classification problem.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sparse Learning for Large-Scale and High-Dimensional Data: A Randomized Convex-Concave Optimization Approach

In this paper, we develop a randomized algorithm and theory for learning a sparse model from large-scale and high-dimensional data, which is usually formulated as an empirical risk minimization problem with a sparsity-inducing regularizer. Under the assumption that there exists a (approximately) sparse solution with high classification accuracy, we argue that the dual solution is also sparse or...

متن کامل

A heuristic approach for multi-stage sequence-dependent group scheduling problems

We present several heuristic algorithms based on tabu search for solving the multi-stage sequence-dependent group scheduling (SDGS) problem by considering minimization of makespan as the criterion. As the problem is recognized to be strongly NP-hard, several meta (tabu) search-based solution algorithms are developed to efficiently solve industry-size problem instances. Also, two different initi...

متن کامل

A Mathematical Programming Model and Genetic Algorithm for a Multi-Product Single Machine Scheduling Problem with Rework Processes

In this paper, a multi-product single machine scheduling problem with the possibility of producing defected jobs, is considered. We concern rework in the scheduling environment and propose a mixed-integer programming (MIP) model for the problem.  Based on the philosophy of just-in-time production, minimization of the sum of earliness and tardiness costs is taken into account as the objective fu...

متن کامل

Doubly Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization with Factorized Data

We proposed a doubly stochastic primal-dual coordinate optimization algorithm for regularized empirical risk minimization that can be formulated as a saddlepoint problem. Different from existing coordinate methods, the proposed method randomly samples both primal and dual coordinates to update solutions, which is a desirable property when applied to data with both a high dimension and a large s...

متن کامل

Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization

Nowadays, algorithms with fast convergence, small memory footprints, and low per-iteration complexity are particularly favorable for artificial intelligence applications. In this paper, we propose a doubly stochastic algorithm with a novel accelerating multi-momentum technique to solve large scale empirical risk minimization problem for learning tasks. While enjoying a provably superior converg...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1606.00136  شماره 

صفحات  -

تاریخ انتشار 2016